documentation/a nwn smart npc.tex

   1 %----------------------------------------------------------------------------------------------
   2 %                       nwn npc agent
   3 %----------------------------------------------------------------------------------------------
   4 \documentclass[a4paper,11pt]{article}
   5
   6 %\usepackage[portuguese]{babel}
   7 \usepackage[utf8]{inputenc}
   8 \usepackage[T1]{fontenc}
   9 \usepackage{graphicx}
  10 \usepackage{float}
  11 \usepackage{amsmath}
  12 \usepackage{geometry}
  13 \usepackage{listings}
  14 \usepackage{xcolor}
  15 \usepackage[colorlinks=true, urlcolor=blue, linkcolor=red]{hyperref}
  16
  17 \definecolor{codegreen}{rgb}{0,0.6,0}
  18 \definecolor{codegray}{rgb}{0.5,0.5,0.5}
  19 \definecolor{codepurple}{rgb}{0.58,0,0.82}
  20 \definecolor{backcolour}{rgb}{0.95,0.95,0.92}
  21
  22 \lstdefinestyle{mystyle}{
  23         backgroundcolor=\color{backcolour},
  24         commentstyle=\color{codegreen},
  25         keywordstyle=\color{magenta},
  26         numberstyle=\tiny\color{codegray},
  27         stringstyle=\color{codepurple},
  28         basicstyle=\ttfamily\footnotesize,
  29         breakatwhitespace=false,
  30         breaklines=true,
  31         captionpos=b,
  32         keepspaces=true,
  33         numbers=left,
  34         numbersep=2pt,
  35         showspaces=false,
  36         showstringspaces=false,
  37         showtabs=false,
  38         tabsize=1
  39 }
  40
  41 \lstset{style=mystyle}
  42 \geometry{top=2.5cm, bottom=2.5cm, left=2cm, right=2cm}
  43 \usepackage{fancyhdr}
  44 \pagestyle{fancy}
  45 \fancyhf{}
  46
  47 \lhead{}
  48 \rhead{A Smart NWN NPC}
  49 \fancypagestyle{noheadrule}{
  50         \fancyhf{}
  51         \renewcommand{\headrulewidth}{0pt}
  52 }
  53 \lfoot{Universidade de Évora}
  54 \rfoot{\thepage}
  55
  56 %-------------------------------------------------------------------------------------
  57 \title{A smart Neverwinter Nights NPC}
  58 \author{Vitor Gonçalo Costa (m70323)}
  59 \date{Universidade de Évora\\ \today}
  60 %-------------------------------------------------------------------------------------
  61 \begin{document}
  62         \maketitle
  63
  64         \abstract{This document aims to describe the process of empowering a video game non playable character with large language model capabilities that reacts to player sentences on an online multiplayer neverwinter nights game server.
  65         %\\\\ \textbf{summary:} Game development, AI, LLM, D\&D.
  66         }
  67
  68         \section{About the Game}
  69         \textit{Neverwinter Nights:Enhanced Edition} is a role play video game owned by the company Beamdog, which is devoted to maintain and improve it since 2015. But the game was released by Bioware in 2002, packaged with the \textit{Aurora Tool kit}, the same engine used by the company to develop the main campaign, which goes by the name \textit{The Wailling Death}, and following expansions, \textit{Shadow of Undrentide} in 2003 and \textit{Hordes od the Underdark} in 2003.
  70
  71         The \textit{Aurora Tool kit} allows players, or dungeon masters, to build their custom worlds and role play campaigns with the Dungeons \& Dragons 3.5 edition rules. Bioware facilitated a server application to open this custom worlds to the internet through UDP protocol, allowing up to 255 client connections per world. The community thrived to build custom content like 3D assets, game logic and other features to enhance the experience of home-brew D\&D campaigns. The engine was so customizable that companies were formed to build entire new games like \textit{Start Wars: Knights Of The Hold Republic} and \textit{The Witcher}.
  72
  73         \section{The infrastructure}
  74         This section aims to briefly describe the tools used to configure a local large language model that is reachable by a different "containerized" service, our game server in this case.
  75
  76         \begin{figure}[h]
  77                 \centering
  78                 \includegraphics[width=0.75\textwidth]{imagens/nwn-asl-server-arch-localhost.png}
  79                 \caption{Server configuration presented to enable llm functionality on a nwn server.}
  80                 \end{figure}
  81
  82         \subsection{Ollama}
  83         Ollama is an open source tool to train large language models like gpt-oss, mistral, llama3 and others made available by the community. Some of this models are available through the cloud, while others could be downloaded and configured locally. By default, after installation the service is accessible at \textbf{http://localhost:11434/}, with the path api/chat/ available to start a conversation.
  84
  85         \begin{verbatim}
  86         curl -X POST http://localhost:11434/api/chat \
  87         -H "Content-Type: application/json" \
  88         -d '{
  89                 "model": "llama3",
  90                 "messages": [
  91                 {
  92                         "role": "system",
  93                         "content": "You are Elrendur, a wise NPC that loves to talk about food recipies
  94                          from Alentejo. Keep your answer to one sentence."
  95                 },
  96                 {
  97                         "role": "user",
  98                         "content": "What can you tell me about this place?"
  99                 }
 100                 ],
 101                 "stream": false
 102         }'
 103         \end{verbatim}
 104
 105         The "messages" array is how we pass a conversational history. The system role sets the personality and rules, while the user role is the actual prompt from the player.
 106
 107         The JSON file answered by Ollama also contains a "message" attribute, from which we can read the following:
 108
 109         \begin{verbatim}
 110         Alentejo is a culinary paradise, where the rich flavors of olive oil and garlic
 111         harmonize with the warmth of sun-kissed bread and the simplicity of rustic
 112         ingredients to create dishes that are as satisfying as they are delicious!
 113         \end{verbatim}
 114
 115         There are other parameters available from the llm response, they can be consulted on Ollama documentation, the total\_duration and eval\_duration, could be used to understand how fast the graphics card is generating text.
 116
 117         \subsection{Middleware custom service}
 118         This service is built with a Python script to leverage the communication between the game server and the artificial intelligence large language model. The script is executed after making the docker images up, it listens to entries created on Redis, a noSQL database.
 119
 120         The middleware is envisioned so that it can evolve to support persistent world campaigns and game play experience by storing information about player characters, respond to world events, update npc attributes and other things limited only by the builder's imagination. In other words, it can have meta server logic, or it can serve other purposes to the D\&D campaign.
 121
 122         The first version of this service is incapable to provision two or more instances of Ollama to answer as many incoming messages from the players. Also, if two players send a message at the same time to the anpc (short for the agent npc, let's call it this way), the message will probably get mixed and jargon data is fed to the llm, outputting nonsense, probably. With this in mind, the middleware was updated to use \textit{asyncio} python library to spawn workers that interact with Ollama in separate threads and provide the answer back to nwn server. We have a variable, flag or semaphore that limits the amount of instances to process, if 20 players talk to anpc at the same time, the script grabs all 20, but only allows 3 to hit the graphics card simultaneously, the other 17 wait in RAM perfectly safely until a slot opens up.
 123
 124         Feel free to peak the code here. The module used in this project is also available on Neverwinter Vault website.
 125
 126         \subsection{Docker Images}
 127         The docker images used are available online, please have a look to the repository github.com/nwnxee for more information on how to set up a custom nwn server.
 128
 129         \subsubsection{NWNX:EE}
 130         From the documentation itself, "\textit{NWNX:EE is a framework that developers can use to modify existing hardcoded rules or inject brand new functionality into Neverwinter Nights: Enhanced edition}". We use scripts from this tool, developed by the community, to inject the messages from a player character to Redis - nwn\_redis and nwn\_redis\_lib to be more specific.
 131
 132         By taking advantage of the engine's module scripts that monitors player's chats, we can write one that triggers a message to the database if any of the players tries to speak to the anpc (please check the on\_player\_chat script available on the module). We perform similar logic when the return message is pushed to Redis by Ollama with ai\_heartbeat script, executed by the engine module's heartbeat event (more of aurora toolkit's lore can be found in http://palmergames.com/lexicon/).
 133
 134         Once everything is set up, a player can interact with the agent by passing a custom commmand !tell followed by the character first name (usually the objects name\_tag) and the message to send to the llm, like bellow:
 135
 136         \begin{verbatim}
 137                 !tell npc_name_tag Hello master dwarf, what do you have in your pockets?
 138         \end{verbatim}
 139
 140         \subsubsection{Redis}
 141         Redis is a noSQL database engine that has two "queues configured", nwn\_to\_llm and llm\_to\_nwn. This is the field used by the middleware and nwnx:ee to route message instances to the right application.
 142
 143         \section{Public Availability \& Security Considerations}
 144
 145         Transitioning the infrastructure from a localized testing environment to a public-facing multiplayer server introduces significant security paradigms that must be addressed. Exposing local hardware to the internet requires strict isolation of internal services and controlled network routing to ensure host safety and service availability.
 146
 147         \subsection{Isolation of Internal Database Services}
 148         The most critical vulnerability in the localized architecture involves the Redis container. By default, Docker binds published ports to the \texttt{0.0.0.0} interface, effectively exposing the database (ports 6379/6380) to the public internet if the host machine is directly connected. This would allow unauthenticated external actors to read chat queues, inject malicious JSON payloads into the engine, or perform arbitrary database commands.
 149
 150         To mitigate this, the Docker Compose configuration is modified to bind the Redis port strictly to the host's loopback address (\texttt{127.0.0.1:6380:6379}). This ensures the database, the Python middleware, and the Ollama API remain completely inaccessible from the outside world, acting solely as internal couriers.
 151
 152         \subsection{Network Routing and UDP Port Forwarding}
 153         Because the internal AI components are secured, public access is restricted entirely to the Neverwinter Nights engine. The game server utilizes the User Datagram Protocol (UDP) to handle client connections. To make the world publicly available, port forwarding rules must be configured on the host's network gateway (router) to route incoming UDP traffic on port 5121 directly to the host machine's local IPv4 address. All TCP traffic and unauthorized UDP ports remain dropped by the network firewall.
 154
 155         \subsection{Distributed Denial of Service (DDoS) Mitigation}
 156         Hosting a public server on consumer-grade ISP infrastructure exposes the host's public IP address to the player base. While the internal system is secure against data breaches, it remains vulnerable to Distributed Denial of Service (DDoS) attacks. A malicious actor could flood the host's IP with junk traffic, overwhelming the residential bandwidth and causing a total network outage.
 157
 158         For small-scale academic testing, this risk is acceptable. However, for a production-scale deployment with up to 255 players, the recommended mitigation strategy involves utilizing a Virtual Private Server (VPS) acting as a Reverse Proxy. Players connect to the public-facing VPS, which masks the host's true IP and securely tunnels the UDP traffic to the residential machine, allowing the cloud provider to absorb potential volumetric attacks.
 159
 160         \section{NWN Persistent World's LLM Implementations}
 161
 162         \subsection{Talk to a Shrine: Immersive Proximity Scanning}
 163
 164         A core objective of implementing Generative AI within a roleplaying environment is maintaining player immersion. Initial prototypes relied on global chat commands (e.g., \texttt{!tell <Target>}), which functioned technically but broke the "fourth wall" of the roleplay experience. To resolve this, an immersive, proximity-based listening system was engineered to allow players to interact with inanimate AI objects (such as statues or shrines) using natural, localized spatial chat.
 165
 166         \subsubsection{Overcoming Legacy Engine Constraints}
 167         A significant architectural challenge within the Aurora Engine is that ``Placeable'' objects (e.g., scenery, statues) do not natively possess an \texttt{OnConversation} event, a feature strictly reserved for Creature entities. Historically, builders circumvented this by spawning invisible creatures inside placeables to act as "ears." To avoid this unoptimized legacy practice, the implementation utilizes a highly efficient Proximity Scanner attached to the Module's global \texttt{OnPlayerChat} event.
 168
 169         \subsubsection{The Proximity Scanner Mechanism}
 170         When a player transmits a message, the server evaluates the chat volume. If the volume matches standard spatial speech (\texttt{TALKVOLUME\_TALK}), the engine performs a localized radial scan (up to 5.0 in-game meters) originating from the player's coordinates. The script iterates through nearby objects, checking for a specific Local String variable named \texttt{LLM\_PROMPT}.
 171
 172         This approach introduces a "Data-Driven Design" paradigm. World builders are no longer required to write or modify code to create new AI entities. They simply place a 3D asset in the Toolset and attach an \texttt{LLM\_PROMPT} string to it (e.g., \textit{"You are the Shrine of the Forgotten, speak in cryptic rhymes"}). If the proximity scanner detects an object with this variable, it designates it as the target Generative Agent.
 173
 174         \subsubsection{Dynamic Context Injection}
 175         Once a valid shrine is identified, the script harvests contextual data from the speaking player character. Biometric and moral attributes, specifically the character's Race (e.g., Elf, Dwarf) and Alignment (e.g., Good, Evil), are extracted using native NWScript functions.
 176
 177         This data is dynamically injected into a JSON payload alongside the player's message and the Shrine's localized \texttt{LLM\_PROMPT}. By passing this comprehensive context state to the asynchronous Python middleware, the LLM is empowered to generate highly personalized responses—such as a "Good" aligned shrine reacting with hostility toward an "Evil" aligned player—without requiring rigid, hardcoded conditional logic.
 178
 179         \subsubsection{Execution and Feedback}
 180         Finally, the serialized JSON payload is pushed to the Redis \texttt{nwn\_to\_llm} queue via the NWNX plugin. To complete the immersive feedback loop, the server immediately applies a localized visual and auditory effect (VFX) to the Shrine object, signaling to the player that their spatial audio was successfully "heard" by the entity while the external LLM processes the response.
 181
 182
 183         %\subsection{LLM-Driven Agents}
 184         \subsection{Autonomous Generative Agents: The Sense-Think-Act Architecture}
 185
 186         While static Shrines demonstrate the viability of contextual text generation, they remain physically inert. To fully realize the potential of Generative AI within a virtual environment, the architecture was expanded to support Autonomous Generative Agents—non-playable characters (NPCs) capable of spatial awareness, decision-making, and physical interaction with the game world. This was achieved by implementing a decoupled ''Sense-Think-Act'' loop.
 187
 188         \subsubsection{The Sense Phase: Optimized Environmental Polling}
 189         To inform the LLM's decision-making, the Agent must perceive its surroundings. This is handled by a custom script attached to the NPC's native \texttt{OnHeartbeat} event. To prevent computational bottlenecking and database saturation (DDoS-ing the local Redis instance), a strict ''Sleep/Wake'' optimization was engineered.
 190
 191         The Agent remains computationally dormant unless a player enters a 20-meter proximity radius. Once awakened, the Agent polls the engine for environmental data—such as its current Hit Points, the time of day, and the identities of nearby entities—but restricts database queries to a throttled tick-rate (e.g., once every 30 seconds). This state data is serialized alongside the Agent's foundational \texttt{LLM\_PROMPT} and pushed to the middleware.
 192
 193         \subsubsection{The Think Phase: JSON Schema Enforcement}
 194         A critical vulnerability in LLM integrations is ''hallucination,'' where the model outputs conversational prose instead of machine-readable commands. To bridge the gap between stochastic text generation and deterministic game logic, the Python middleware explicitly forces the Ollama API into a strict JSON formatting mode.
 195
 196         The LLM is instructed to evaluate the environmental payload and output a specific JSON schema containing four keys: \texttt{thought} (internal reasoning), \texttt{speech} (outward dialogue), \texttt{action} (a macro-directive), and \texttt{action\_target} (the object of the directive). This structural enforcement guarantees that the game engine receives predictable, parseable data.
 197
 198         \subsubsection{The Act Phase: Macro-Directives vs. Micro-Actions}
 199         A common pitfall in generative game AI is attempting to force the LLM to handle real-time pathfinding or physics calculations, which results in erratic behavior. This architecture solves this by delegating responsibilities: the LLM dictates ''Macro-Directives'' (the \textit{what}), while the Aurora Engine executes the ''Micro-Actions'' (the \textit{how}).
 200
 201         The Module's \texttt{OnHeartbeat} receiver script polls the \texttt{llm\_to\_nwn} Redis queue and parses the incoming JSON. If the LLM assigns the \texttt{INTERACT} action, the engine translates this into native NWScript functions (\texttt{ActionMoveToObject}, \texttt{SetFacingPoint}) to smoothly navigate the NPC toward the target. Compatibility shields were also programmed into the receiver to gracefully handle malformed data, defaulting to a heuristic wander state if the LLM output fails validation.
 202
 203         Ultimately, this architecture produces an illusion of profound intelligence, enabling NPCs to dynamically patrol, guard, converse, and interact based entirely on generative reasoning, while consuming minimal overhead on the primary game server loop.
 204
 205         \subsubsection{Semantic Translation and Fuzzy Logic Matching}
 206
 207         A recurring challenge when bridging Large Language Models with deterministic legacy engines is semantic generalization. While the Aurora Engine requires exact string matches to identify objects (e.g., a placeable named "Wooden Stool"), an 8-Billion parameter LLM will frequently output synonyms based on its vast training data, returning targets such as "Chair" or "Seat" instead. If the engine performs a literal evaluation, the action silently fails, breaking the immersion.
 208
 209         To resolve this without penalizing the AI's natural language capabilities, a semantic translation layer—or Synonym Dictionary—was implemented directly within the \texttt{ai\_receiver} script. Before the engine searches for the target, it sanitizes the LLM's output using lowercase conversions and fuzzy substring matching.
 210
 211
 212         This approach creates a fault-tolerant bridge: the LLM is free to reason contextually (e.g., "I am thirsty, I will grab a drink"), and the engine dynamically maps "drink" to the nearest "Wine Cup" placeable, ensuring the action queue never freezes.
 213
 214         \subsubsection{Smart Animation Heuristics for Environmental Immersion}
 215
 216         To maintain server performance and prevent players from accidentally opening the inventories of decorative objects, tavern placeables (like kegs, plates of food, and stools) are flagged as non-useable in the Aurora Toolset. However, this engine limitation strips the objects of their default interaction animations, causing the Generative Agent to simply stand inert next to the object when executing a \texttt{USE\_OBJECT} directive.
 217
 218         To restore visual immersion, a "Smart Animation Guesser" was engineered into the receiver script. By evaluating the same fuzzy string matched in the previous step, the engine heuristically determines the physical nature of the object and triggers the appropriate native Bioware animation.
 219
 220         This heuristic mapping allows a single \texttt{USE\_OBJECT} macro-directive to branch into highly specific, visually distinct actions. The Generative Agent seamlessly transitions from raising an invisible mug to his lips, to sitting down to rest, entirely driven by the localized context of the 3D assets placed by the world builder.
 221
 222         \subsection{Scaling the Ecosystem – Multi-Agent Architecture and World Vividness}
 223         This chapter details the transition from a single AI-driven NPC to a fully populated, reactive world. By decoupling prompts, integrating native engine data, and mixing "Heavy" AI with "Lite" background actors, we can create a cinematic environment without overloading the local GPU.
 224
 225         \subsubsection{Component-Based Prompt Architecture}
 226         To make world-building efficient in the Aurora Toolset, monolithic AI prompts are retired in favor of a decoupled, variable-based system.
 227
 228         The Blueprint: Instead of writing a massive JSON template for every NPC, builders define modular local strings on the character sheet: LLM\_PERSONA, LLM\_PROFESSION, LLM\_MOOD, LLM\_SECRET, and LLM\_ROUTINE.
 229
 230         The Prompt Compiler: The Python middleware (redis\_bridge.py) acts as a compiler. It intercepts these modular variables from Redis, stitches them together into a cohesive narrative context, and automatically appends the strict JSON formatting rules before sending the payload to the LLM. This abstracts the prompt engineering away from the game builder.
 231
 232         \subsubsection{Native Engine Integration (Smart Context)}
 233         To further reduce the builder's workload, the NWScript architecture actively pulls native game data and feeds it to the AI.
 234
 235         \textbf{Automatic Trait Extraction:} The ai\_receiver script translates the engine’s internal integer constants for Alignment, Race, and Gender into English strings (e.g., "Lawful Neutral Dwarf").
 236
 237         \textbf{        Seamless Roleplay:} This data is injected into the Prompt Compiler. The LLM organically knows the NPC's physical and moral traits without the builder ever having to type them out, ensuring consistent, lore-accurate roleplay.
 238
 239         \subsubsection{Dynamic Emotional Expressions}
 240         To prevent NPCs from feeling like static chat-bots, the JSON payload template is expanded to include a fifth key: "emotion".
 241
 242         \textbf{AI-Driven Directives:} The LLM is instructed to output a specific emotional state based on its generated dialogue (e.g., ANGRY, LAUGHING, BOW, TAUNT).
 243
 244         \textbf{Engine Translation:} The NWScript ai\_receiver parses this emotion key and executes the corresponding ActionPlayAnimation command on a 0.2-second delay. This synchronizes the NPC's physical body language perfectly with the appearance of their chat bubble.
 245
 246         \subsubsection{The "Python Bouncer" (Sanitization \& Stability)}
 247         Relying on smaller, local LLMs (like 8B parameter models) introduces the risk of "hallucinations" or broken JSON outputs, especially when multiple agents interact.
 248
 249         \textbf{Temperature Control:} The Ollama API request is set to a low Temperature (e.g., 0.2) to force logical, formatted outputs over unconstrained creativity.
 250
 251         \textbf{JSON Sanitization:} Before pushing the AI's response back to the game engine, Python validates the JSON structure. It forces keys to lowercase, strips stray punctuation from action targets, and explicitly checks for the existence of all required keys.
 252
 253         \textbf{The Anti-Silence Override:} If the AI attempts to output empty speech (giving the player the silent treatment), Python intercepts it and injects a physical sound (e.g., *grunts quietly*) to ensure the game engine's animation and action loops continue firing normally.
 254
 255         \subsubsection{Heavy vs. Lite Agents (GPU Optimization)}
 256         Processing every single town resident through a local GPU is mathematically impossible for consumer hardware. The solution is a hybrid ecosystem.
 257
 258         "Heavy" Agents: Key NPCs (guards, villains, quest givers) possess the LLM\_PERSONA variable and utilize the full Python/Redis async bridge, generating complex thoughts, actions, and environmental awareness.
 259
 260         "Lite" Agents: Background extras (washerwomen, drunks) do not use the LLM. Instead, they use a custom lite\_on\_convo script coupled with an omnipresent Listen Pattern (**). When a Heavy Agent or Player speaks nearby, the Lite Agent instantly fires a SpeakString reply from a pre-defined Toolset variable without interrupting their walking path.
 261
 262         The Hybrid Shopkeeper: Store owners utilize a dual-state script. If clicked by a player, they open a standard Bioware store menu. If spoken to by a nearby AI, they throw a floating text bubble. This maintains core RPG gameplay loops while contributing to the ambient noise of the city.
 263
 264         \subsubsection{Advanced Pathfinding and Cross-Area Navigation}
 265         For Heavy Agents to utilize the GO\_TO and RETURN\_TO\_POST commands across vast city maps, specific Toolset rules must be followed:
 266
 267         Standard Transitions: Doors and area boundaries must use standard, linked Toolset transitions, not custom jump scripts.
 268
 269         Door Logic: NPCs must have the nw\_c2\_defaulte script assigned to their OnBlocked event so they know how to open doors, and key doors must remain unlocked (or the NPC must carry the specific key item).
 270
 271         The Teleport Fallback: If an NPC's pathfinding math times out due to immense distance, the ai\_receiver script employs a delayed ActionJumpToObject safety net, ensuring agents always reach their destinations off-screen if they get stuck.
 272
 273         %\subsection{Knowledge Injection: Teaching the AI world Lore}
 274         %RAG (Retrieval-Augmented Generation), or for smaller amounts of lore, just Dynamic Prompting. Llama 3 already knows 99% of D&D 3.5e rules because the SRD (System Reference Document) was in its training data. We just need to give it your custom "Alentejo Sem Lei" lore.
 275
 276
 277         \subsection{Knowledge Injection: Teaching the AI world Lore}
 278         As noted in the project's foundational goals, the underlying LLM (such as Llama 3) already possesses extensive knowledge of D\&D 3.5 edition rules due to the System Reference Document (SRD) existing within its training data. However, the model requires specific context for the custom "Alentejo Sem Lei" setting. To achieve this without relying on complex Retrieval-Augmented Generation (RAG) pipelines, the Python middleware was upgraded to read an external lore document (asl\_lore.txt) upon initialization. This global world state is dynamically injected into the system prompt of every agent, ensuring their generated responses are grounded in the server's specific narrative and local rumors.
 279
 280
 281         \subsection{Biological Self-Awareness and Native D\&D Mechanics}
 282         While the proximity scanner initially harvested contextual data from the speaking player character—specifically biometric and moral attributes like Race and Alignment —the architecture was expanded to grant the Generative Agents biological self-awareness. During the Sense Phase's environmental polling, the engine now extracts the NPC's current Hit Points compared to their maximum capacity. This physical state is injected into the JSON payload alongside the player's message. By integrating this self-awareness with new macro-directives, the LLM can autonomously output commands like REST when severely injured, or use STEALTH and SEARCH to leverage native engine mechanics without explicit player instruction.
 283
 284         \subsection{Multi-Agent Group Conversations}
 285         The proximity scanner mechanism originally functioned by iterating through nearby objects within a 5.0 in-game meter radius  and terminating the loop upon finding the first valid Generative Agent. To simulate realistic group dynamics, this early termination was removed. The scanner now broadcasts the contextual JSON payload  to every valid agent within earshot. The Python middleware's asynchronous concurrency safely handles the simultaneous requests, resulting in multiple NPCs evaluating the prompt and responding sequentially, creating natural, multi-agent RP scenarios.
 286
 287
 288         \subsection{Dynamic Combat De-escalation (The Peace Macro)}
 289         To further bridge stochastic text generation with deterministic game logic, the receiver script was refactored to handle the complexities of the Aurora Engine's native faction and hostility systems. Initially, active combat states prevented the agent from reading incoming JSON payloads. The architecture was updated to parse the LLM's response prior to evaluating the engine's combat state, allowing the AI to "hear" player input mid-fight. Furthermore, a PEACE macro-directive was introduced. If a player successfully apologizes to an aggressive agent, the LLM outputs this macro, prompting the engine to clear personal reputations across all connected players and execute a native surrender command, allowing for natural language de-escalation of combat.
 290
 291
 292         \subsection{Psychological State Injection: The Hostility Override}
 293         A recurring anomaly in early combat testing involved "Goldfish AI" behavior, wherein a Generative Agent would maintain a polite, conversational demeanor even while actively being attacked by a player. This occurred because the LLM lacks native visibility into the Aurora Engine's internal faction and reputation matrices, relying entirely on its baseline prompt instructions.
 294
 295         To resolve this cognitive dissonance, a Psychological Hostility Override was implemented within the chat interception pipeline. When a player interacts with an Agent, the engine evaluates their real-time relationship status using the native \texttt{GetIsEnemy()} function. If the engine determines the player is actively flagged as hostile (due to ongoing combat or an unresolved grudge), a critical psychological override string is injected into the JSON payload (e.g., \textit{"CRITICAL: THIS PLAYER IS YOUR ENEMY! They have attacked you. You are FURIOUS."}).
 296
 297         The Python middleware dynamically updates the LLM's system prompt with this relationship state. Consequently, the LLM drops its default persona and generates highly aggressive, threatening dialogue and hostile macro-directives, ensuring the Agent's textual output perfectly mirrors the engine's deterministic combat state.
 298
 299
 300         \subsection{Physical Emote Extraction and Non-Verbal Communication}
 301         Traditional text-based interactions with Generative AI often limit player expression strictly to verbal dialogue. However, in a 3D graphical environment, non-verbal communication and body language are crucial for deep roleplay immersion. To capture this physical context, the chat interception pipeline was expanded to include a Physical Emote Extraction system.
 302
 303         When a player utilizes specific slash commands in the chat (such as \texttt{/bow}, \texttt{/taunt}, or \texttt{/cheer}), the engine's \texttt{OnPlayerChat} script intercepts the input before it broadcasts to the server. The script executes two simultaneous operations. First, it forces the player's 3D avatar to play the corresponding native Bioware animation, anchoring the action in the visual game world. Second, it clears the text message to prevent UI clutter and translates the physical action into a semantic descriptive string (e.g., \textit{"[The player physically bows respectfully to you]"}).
 304
 305         This descriptive string is immediately appended to the \texttt{player\_state} context variable within the JSON payload. Consequently, the Generative Agent perceives the purely physical interaction alongside the player's biometric data. This allows players to silently provoke, greet, or surrender to an NPC, and the LLM will generate appropriate thoughts, emotions, and macro-directives in response to the body language alone.
 306
 307         \subsection{Geographic Awareness and Spatial Context Injection}
 308         A critical element in producing a believable Generative Agent is spatial awareness. Without geographic context, an LLM defaults to generic, location-agnostic dialogue. To ground the agents within the specific environments of the persistent world, the environmental radar system was upgraded to extract and inject spatial data using a dual-layered approach within the Aurora Engine.
 309
 310         The implementation leverages the engine's Local Variables system to attach descriptive string variables (\texttt{LLM\_LOCATION\_CONTEXT}) to spatial objects. This occurs at two levels of granularity:
 311         \begin{enumerate}
 312                 \item \textbf{Broad Context (Area-Level):} Variables attached directly to the global Area object define the general atmosphere and macro-location (e.g., \textit{"You are in the City Docks. It smells of salt and rotting fish."}).
 313                 \item \textbf{Granular Context (Waypoint-Level):} To provide hyper-local awareness, builders can place custom, invisible waypoints (tagged \texttt{WP\_LLM\_LORE}) near specific points of interest. These waypoints hold variables describing immediate surroundings (e.g., \textit{"You are standing next to a smuggler's skiff."}).
 314         \end{enumerate}
 315
 316         During the Agent's routine \texttt{OnHeartbeat} polling cycle, the native \texttt{GetArea()} function extracts the broad context, while a spatial proximity scan (\texttt{GetNearestObjectByTag} within a 15-meter radius) extracts any granular waypoint context. These strings are concatenated and appended to the serialized JSON payload under a new \texttt{location\_context} key.
 317
 318         Upon receiving the payload, the Python middleware injects this spatial data directly into the LLM's dynamic system prompt. Consequently, when a player interacts with the Agent, the LLM incorporates its physical surroundings into its reasoning and dialogue, enabling dynamic, location-specific roleplay without requiring custom scripts for individual areas.
 319
 320         \subsection{Hierarchical Swarm Intelligence: The Commander Agent}
 321         A fundamental limitation of integrating Large Language Models into real-time multiplayer environments is computational overhead. Provisioning a dedicated LLM inference thread for every individual hostile creature within a dungeon encounter would result in severe latency and catastrophic hardware bottlenecking. To achieve large-scale, intelligent combat encounters without overwhelming the host GPU, the architecture utilizes a "Director AI" paradigm, implemented here as a Hierarchical Swarm Intelligence.
 322
 323         Instead of granting generative autonomy to every entity, the cognitive load is centralized. Only the faction leader—designated as the "Villain" or "Commander"—is equipped with an \texttt{LLM\_PERSONA} variable and access to the Sense-Think-Act loop. The subordinate minions continue to operate on the Aurora Engine's highly optimized, native C++ combat heuristics.
 324
 325         \subsubsection{The COMMAND Macro and Tactical Override}
 326         To bridge the cognitive gap between the LLM Commander and the native engine minions, a specialized \texttt{COMMAND} macro-directive was engineered. During the Think Phase, the Commander evaluates the dynamic \texttt{player\_state} context (e.g., identifying a physically weak spellcaster dealing massive damage). If a tactical adjustment is required, the LLM outputs the \texttt{COMMAND} action paired with a specific target parameter.
 327
 328         These parameters include explicit player names for "Focus Fire" tactics, or broad operational directives such as \texttt{RETREAT} or \texttt{DEFEND\_ME}.
 329
 330         \subsubsection{Engine-Side Execution and Swarm Control}
 331         When the receiver script intercepts a \texttt{COMMAND} macro, it acts as a localized broadcast beacon. The engine executes a radial sweep to identify all creatures sharing a faction with the Commander (\texttt{GetFactionEqual}). Upon identifying valid allied minions, the script forcefully clears their native combat action queues (\texttt{ClearAllActions}) and injects the LLM's tactical directive.
 332
 333         Depending on the targeted parameter, the engine utilizes native NWScript functions to execute the maneuver:
 334         \begin{itemize}
 335                 \item \textbf{Focus Fire:} Minions are commanded via \texttt{ActionAttack} to bypass heavily armored players and swarm the designated high-value target.
 336                 \item \textbf{Retreat:} Minions execute \texttt{ActionMoveAwayFromObject}, breaking line of sight to regroup.
 337                 \item \textbf{Defend:} Minions utilize \texttt{ActionForceFollowObject} to form a defensive perimeter around the Commander.
 338         \end{itemize}
 339
 340         This hierarchical approach yields profound emergent gameplay. The computational cost remains strictly at one LLM request per encounter, yet the entire swarm exhibits highly coordinated, dynamic tactics that adapt to player behavior in real-time. Furthermore, this systemic integration means that if players successfully utilize the \texttt{PEACE} macro to negotiate a surrender with the Commander, the native engine hierarchy ensures all subordinate minions immediately cease hostilities as well.
 341
 342
 343         \section{Advanced Cognitive Architectures: Vector Databases and Memory}
 344         As the complexity of the Generative Agents scaled, injecting static global lore and maintaining conversational history within the LLM's system prompt introduced severe token bloat. This resulted in hardware bottlenecking, increased latency (Time-to-First-Token), and context window exhaustion. To achieve highly performant, long-term cognitive depth, the architecture was upgraded to include a local Vector Database (\texttt{ChromaDB}) functioning independently alongside the Redis message broker.
 345
 346         \subsection{Retrieval-Augmented Generation (RAG) for Dynamic World Lore}
 347         Rather than forcefully injecting the entire campaign lore document into every LLM request, the system utilizes a Retrieval-Augmented Generation (RAG) pipeline. Upon initialization, the Python middleware ingests a globally curated Markdown file (\texttt{asl\_lore.md}). The system parses the document hierarchically, converts the paragraphs into mathematical vector embeddings, and stores them in a dedicated \texttt{ChromaDB} collection.
 348
 349         During live gameplay, when a player speaks to an Agent, the middleware performs a semantic similarity search using the player's text and the Agent's geographic \texttt{location\_context}. The database retrieves only the top-most relevant paragraphs (e.g., pulling rumors specific to the "City Docks" when the conversation occurs there) in under 20 milliseconds. This hyper-specific, localized lore is then dynamically injected into the LLM's prompt, granting the NPCs vast, encyclopedic knowledge of the game world while utilizing a fraction of the computational tokens.
 350
 351         \subsection{Asynchronous Episodic Memory}
 352         To simulate persistent, long-term relationships between players and NPCs across server resets, an Asynchronous Episodic Memory system was engineered. Operating within the strict constraints of real-time multiplayer latency, the system decouples memory processing from the live conversational loop.
 353
 354         The middleware maintains a sliding context window of the ten most recent messages for any given interaction. When this threshold is exceeded, the oldest messages are extracted and pushed to a low-priority asynchronous background queue. While the player continues interacting with the game world seamlessly, a background Python worker prompts a lightweight LLM inference to summarize the extracted chat log into a brief, past-tense memory.
 355
 356         This summary is embedded and stored in a secondary \texttt{ChromaDB} collection, uniquely indexed by a composite \texttt{session\_id} containing the Player's name and the NPC's tag. During future interactions—even weeks later—the RAG pipeline queries this collection, allowing the Generative Agent to autonomously recall past alliances, grudges, shared secrets, and specific conversational nuances without degrading server performance.
 357
 358
 359         \subsection{Decoupled Cognitive Archetypes: A Multi-Strategy Architecture}
 360         As the Generative Agent population scaled within the persistent world, relying on a monolithic system prompt for all entities introduced significant behavioral anomalies. Ambient townspeople would occasionally attempt to execute combat macros, while hostile combatants would waste processing tokens attempting to initiate casual dialogue mid-battle. Furthermore, a unified prompt restricted the ability to assign specialized engine-level capabilities (e.g., opening merchant stores or granting quests) without risking hallucinated executions by unqualified NPCs.
 361
 362         To resolve these logical conflicts and optimize token utilization, the AI architecture was refactored into a decoupled, modular state-machine. Entities within the engine are now assigned a discrete \texttt{LLM\_STRATEGY} integer variable, categorizing them into one of four distinct cognitive archetypes:
 363
 364         \begin{enumerate}
 365                 \item \textbf{Strategy 1: The Autonomous Agent (Interactive/Economic):} Player-centric entities designed for rich dialogue, environmental interaction, and economic utility. These agents possess unique engine hooks allowing them to trigger native User Interface elements, such as opening merchant inventories (\texttt{OPEN\_STORE}) or interacting with the server's quest journaling system (\texttt{GIVE\_QUEST}).
 366                 \item \textbf{Strategy 2: The Villain Commander (Tactical/Hostile):} Combat-oriented entities designed to provide dynamic, asymmetrical encounters. Their cognitive loop prioritizes threat assessment, allowing them to issue tactical directives (\texttt{COMMAND}, \texttt{RETREAT}) to lesser, non-generative minions, or utilize the \texttt{PEACE} macro to dynamically surrender based on their biological self-awareness (\texttt{npc\_health}).
 367                 \item \textbf{Strategy 3: The Maestro (Ambient Puppeteer):} A highly optimized strategy designed solely for environmental immersion. The Maestro entirely ignores player character inputs and proximity triggers. Instead, it utilizes the aforementioned "Puppeteer Method" to continuously scan for and converse with non-LLM "dumb" NPCs, simulating a vibrant, living ecosystem at a fraction of the computational cost.
 368                 \item \textbf{Strategy 4: The Shrine (Environmental Narrative):} Inanimate or stationary objects (e.g., magical statues, talking doors) that lack mobility macros (\texttt{WANDER}, \texttt{PATROL}) but possess cryptic, condition-based dialogue trees capable of granting specialized quests.
 369         \end{enumerate}
 370
 371         \subsubsection{Dynamic Prompt Compilation and Scope Restriction}
 372         Within the Python asynchronous middleware, the \texttt{LLM\_STRATEGY} flag dictates the real-time compilation of the LLM's system prompt. Rather than feeding the model a static list of all possible engine macros, the middleware dynamically injects a strictly filtered \texttt{action\_macros} string and a customized \texttt{target\_context}.
 373
 374         For example, if the strategy integer is 3 (The Maestro), the prompt is explicitly instructed: \textit{"You are ignoring players and focusing on ambient life. Do not address players."} Simultaneously, combat macros like \texttt{ATTACK} are omitted from its allowed output schema. This strict prompt-level scope restriction guarantees zero-shot adherence to the NPC's assigned mechanical role, entirely eliminating cross-strategy hallucinations.
 375
 376         \subsubsection{Event-Driven Asynchronous Overrides}
 377         To accommodate the varying urgency of these strategies, the native engine's "Sense" hooks were heavily modified. While ambient observation relies on a strict 60-second heartbeat throttle to conserve server CPU cycles, tactical entities (Strategy 1 and 2) require immediate reaction to physical threats.
 378
 379         A dedicated asynchronous override was implemented via the engine's native \texttt{OnDamaged} and \texttt{OnPhysicalAttacked} events. When an agent sustains damage, the engine bypasses the heartbeat throttle and instantly pushes a "Pain Trigger" payload to the Redis queue (e.g., \textit{"SYSTEM CRITICAL: You were just physically attacked..."}). This forces the LLM to immediately interrupt its current cognitive task and generate a real-time tactical or conversational response to the aggression, ensuring the AI feels remarkably responsive during emergent gameplay.
 380
 381         %\newpage
 382         %---------------------------------------------------
 383         \section{System Performance and Resource Evaluation}
 384
 385         To validate the viability of integrating Generative AI into a real-time multiplayer environment, a comprehensive performance analysis was conducted. The evaluation focuses on three critical vectors: player-perceived latency, middleware concurrency management, and hardware resource utilization.
 386
 387         \subsection{Latency and Response Time Analysis}
 388         In traditional multiplayer architectures, server tick-rate and network latency are measured in milliseconds. However, LLM inference introduces significant processing delays. To quantify this, the "Round Trip Time" (RTT) was measured—defined as the moment a player transmits a chat message to the moment the generative agent executes an action or speaks.
 389
 390         The RTT is composed of network routing, Redis queuing, and the AI inference duration. By extracting the \texttt{total\_duration} parameter from the Ollama API JSON response (measured natively in nanoseconds), we can isolate the inference bottleneck.
 391
 392         \begin{table}[h!]
 393                 \centering
 394                 \begin{tabular}{|l|c|c|c|}
 395                         \hline
 396                         \textbf{Model Parameter Size} & \textbf{Average RTT (s)} & \textbf{Inference Time (s)} & \textbf{Engine Overhead (ms)} \\ \hline
 397                         Llama 3 (8B) - 4-bit Quant.   & [Insert Data]            & [Insert Data]               & $\sim 15$ ms                  \\ \hline
 398                         % Add more rows if you test other models like Mistral or Gemma
 399                 \end{tabular}
 400                 \caption{Average Latency Metrics for a Single Agent Request}
 401                 \label{tab:latency}
 402         \end{table}
 403
 404         The data demonstrates that while the legacy Aurora Engine and the asynchronous Python middleware process data in mere milliseconds, the LLM inference acts as the primary bottleneck, validating the necessity of decoupling the AI generation from the game's primary execution thread.
 405
 406         \subsection{Concurrency Limits and Asynchronous Queuing}
 407         A Persistent World server must handle multiple concurrent events. The middleware was designed with an \texttt{asyncio.Semaphore} mechanism to act as a localized load balancer, preventing GPU memory overflow by capping concurrent inference tasks at $N=3$.
 408
 409         To test the efficacy of this queuing system, burst-load stress tests were conducted by simulating 1, 5, and 10 simultaneous player interactions.
 410
 411         %\begin{figure}[h!]
 412         %       \centering
 413                 % You will replace this with an actual image of your graph later
 414                 % \includegraphics[width=0.8\textwidth]{concurrency_graph.png}
 415         %       \vspace{4cm} % Placeholder space
 416         %       \caption{Response time degradation under concurrent load bursts.}
 417         %       \label{fig:concurrency}
 418         %\end{figure}
 419
 420         %Under a load of 5 simultaneous requests, the first 3 requests are processed immediately by the GPU, while the remaining 2 are held safely in system RAM by the Python event loop. The graph (Figure \ref{fig:concurrency}) illustrates a staircase effect in response times: the final requests in a 10-burst payload experience significantly higher latency as they wait for the semaphore to release, though the server hardware remains entirely stable without crashing.
 421
 422         \subsection{Hardware Resource Utilization}
 423         Hosting local LLMs requires substantial graphical processing power. The host machine utilized for this evaluation features an NVIDIA RTX 5060 Ti GPU. Monitoring tools were deployed during the concurrency stress tests to record Video RAM (VRAM) allocation and thermal output.
 424
 425         \begin{itemize}
 426                 \item \textbf{VRAM Allocation:} The Llama 3 8-Billion parameter model, when loaded into memory, consumes approximately [Insert GB] of VRAM. During peak concurrent loads, VRAM spikes to [Insert GB] to handle context windows, staying safely below the GPU's maximum capacity.
 427                 \item \textbf{Thermal Dynamics:} At an idle state, the GPU operates at [Insert Temp]$^{\circ}$C. During a sustained 10-request burst load, the temperature peaks at [Insert Temp]$^{\circ}$C, indicating that the semaphore cap effectively prevents thermal throttling.
 428         \end{itemize}
 429
 430         Ultimately, the resource evaluation proves that consumer-grade hardware is highly capable of running a decentralized, AI-driven Persistent World, provided that strict middleware traffic orchestration is implemented.
 431
 432         %---------------------------------------------------
 433
 434         \section{Future Work: Blockchain Integration and Monetization}
 435         The successful implementation of an asynchronous middleware architecture (via Python and Redis) to connect the Aurora Engine to external AI models opens a pathway for further external service integrations. A highly relevant avenue for future development is the exploration of Web3 technologies, specifically the integration of cryptocurrencies and blockchain mechanics to establish a decentralized player economy.
 436
 437         \subsection{Strategic Implementation Plan for Web3 Gaming}
 438
 439         Integrating a digital economy into a legacy Persistent World requires a phased approach to maintain gameplay balance while introducing real-world value. The existing middleware is perfectly positioned to act as the "Oracle" between the Neverwinter Nights server and a blockchain network (such as Polygon or Arbitrum, chosen for low transaction fees).
 440
 441         \subsubsection{Phase 1: In-Game Currency Tokenization (ERC-20)}
 442         The foundational step involves mapping the in-game currency (Gold Pieces) to a custom cryptocurrency token (e.g., an ERC-20 token named \$ASL).
 443         \begin{itemize}
 444                 \item \textbf{The Faucet Mechanism:} Players earn \$ASL by participating in the game world—completing LLM-generated quests, defeating bosses, or roleplaying. The Python middleware listens for these game events via Redis and triggers a smart contract payout to the player's connected digital wallet.
 445                 \item \textbf{The Sink Mechanism:} Players can spend \$ASL to purchase premium server features, such as cosmetic visual effects, custom player housing, or the ability to mint new AI Shrines in the world.
 446         \end{itemize}
 447
 448         \subsubsection{Phase 2: Digital Asset Ownership (NFTs / ERC-721)}
 449         Neverwinter Nights features a robust crafting and loot system. Future iterations of the server could tokenize unique, high-tier game items as Non-Fungible Tokens (NFTs).
 450         If a player discovers a legendary sword, the Python bridge interacts with the blockchain to mint an NFT representing that exact weapon, binding its unique stats and history to the token. Players could then freely trade or sell these items on external decentralized marketplaces (like OpenSea) for real-world currency, allowing the server host to collect a small royalty percentage on every transaction.
 451
 452         \subsubsection{Phase 3: Decentralized Governance (DAO)}
 453         For a heavy roleplay server, community management is critical. Tokenizing the server allows for the creation of a Decentralized Autonomous Organization (DAO). Players holding the server's token could securely vote on proposed changes to the game rules, lore directions, or even vote on which AI Personalities should be permanently added to the city guards. This creates a deeply invested, player-driven ecosystem.
 454
 455         \subsection{Architectural Advantages of the Existing Middleware}
 456         The asynchronous \texttt{aiohttp} and Redis architecture developed for the Generative Agents is inherently suited for blockchain integration. In the same way the Python script currently sends JSON payloads to the Ollama API, it can utilize libraries such as \texttt{web3.py} to securely sign and transmit transaction data to a blockchain RPC node. The legacy C++ Aurora Engine remains entirely isolated from the cryptographic complexity, simply receiving a Boolean confirmation from Redis once a blockchain transaction clears.
 457
 458         \subsection{Academic and Design Challenges}
 459         While technically feasible, monetizing a Persistent World via cryptocurrencies introduces significant challenges that must be evaluated:
 460         \begin{itemize}
 461                 \item \textbf{Economic Balancing:} The risk of hyperinflation if token generation (faucets) outpaces token spending (sinks).
 462                 \item \textbf{Pay-to-Win Dynamics:} Careful game design is required to ensure that purchasing tokens with real money does not ruin the competitive integrity or roleplay immersion of the server.
 463                 \item \textbf{Regulatory Compliance:} Navigating the legal landscape of distributing digital assets and preventing money laundering within a video game ecosystem.
 464         \end{itemize}
 465
 466
 467 \end{document}